dynamic pricing
Contextual Dynamic Pricing with Heterogeneous Buyers
We initiate the study of contextual dynamic pricing with a heterogeneous population of buyers, where a seller repeatedly posts prices (over T rounds) that depend on the observable d-dimensional context and receives binary purchase feedback. Unlike prior work assuming homogeneous buyer types, in our setting the buyer's valuation type is drawn from an unknown distribution with finite support size K . We develop a contextual pricing algorithm based on optimistic posterior sampling with regret eO(K dT), which we prove to be tight in dand T up to logarithmic terms. Finally, we refine our analysis for the non-contextual pricing case, proposing a variance-aware zooming algorithm that achieves the optimal dependence on K .
Transfer Faster, Price Smarter: Minimax Dynamic Pricing under Cross-Market Preference Shift
We study contextual dynamic pricing when a target market can leverage Kauxiliary markets--offline logs or concurrent streams--whose mean utilities differ by a structured preference shift. We propose Cross-Market Transfer Dynamic Pricing (CM-TDP), the first algorithm that provably handles such model-shift transfer and delivers minimax-optimal regret for both linear and nonparametric utility models. For linear utilities of dimension d, where the difference between source-and targettask coefficients is s0-sparse, CM-TDP attains regret eO (dK 1 + s0) log T .
Optimal Contextual Pricing under Agnostic Non-Lipschitz Demand
We study contextual dynamic pricing with linear valuations and bounded-support agnostic noise, whose induced demand curve may be non-Lipschitz with arbitrary jumps and atoms. Such discontinuities break the cross-context interpolation arguments used by smooth-demand pricing algorithms, while the best previous method achieved only $\tilde O(T^{3/4})$ regret. We propose Conservative-Markdown Redirect-UCB Pricing, a polynomial-time algorithm that combines randomized parameter estimation, conservative residual-grid probing, and confidence-based one-step redirection. Our algorithm achieves $\tilde O(T^{2/3})$ optimal regret, matching the known lower bounds of Kleinberg and Leighton (2003) up to logarithmic factors and improving over the previous upper bound of Xu and Wang (2022). Under stochastic well-conditioned contexts, this closes the long-existing open regret gap in linear-valuation contextual pricing under agnostic non-Lipschitz noise distribution.
Dynamic pricing and assortment under a contextual MNL demand
We consider dynamic multi-product pricing and assortment problems under an unknown demand over T periods, where in each period, the seller decides on the price for each product or the assortment of products to offer to a customer who chooses according to an unknown Multinomial Logit Model (MNL). Such problems arise in many applications, including online retail and advertising. We propose a randomized dynamic pricing policy based on a variant of the Online Newton Step algorithm (ONS) that achieves a O(d T log(T))regret guarantee under an adversarial arrival model. We also present a new optimistic algorithm for the adversarial MNL contextual bandits problem, which achieves a better dependency than the state-of-the-art algorithms in a problem-dependent constant ฮบ2 (potentially exponentially small). Our regret upper bound scales as O(d ฮบ2T +log(T)/ฮบ2), which gives a stronger bound than the existing O(d T/ฮบ2)guarantees.
Dynamic pricing and assortment under a contextual MNL demand
We consider dynamic multi-product pricing and assortment problems under an unknown demand over T periods, where in each period, the seller decides on the price for each product or the assortment of products to offer to a customer who chooses according to an unknown Multinomial Logit Model (MNL). Such problems arise in many applications, including online retail and advertising. We propose a randomized dynamic pricing policy based on a variant of the Online Newton Step algorithm (ONS) that achieves a O(d T log(T))regret guarantee under an adversarial arrival model. We also present a new optimistic algorithm for the adversarial MNL contextual bandits problem, which achieves a better dependency than the state-of-the-art algorithms in a problem-dependent constant ฮบ2 (potentially exponentially small). Our regret upper bound scales as O(d ฮบ2T +log(T)/ฮบ2), which gives a stronger bound than the existing O(d T/ฮบ2)guarantees.